The existence of significant immunoediting signal

From our pan-cancer data, we can get two distributions: mutation counts of samples and IC50 values of neoantigens. We randomly selected 10000 counts from the distribution of mutation counts to construct simulated samples which each sample has its own mutation counts. For each mutations in a simulated sample, we randomly selected a IC50 value from IC50 distribution. Now we have 10000 samples, each sample has its mutation counts and each mutations has its IC50 values. Given that CCF or expression is a criterion for ranking, we just used the natural order to rank the simulated mutations. At last, we calculated NES for these simulated samples to get neutral distribution of NES.

The following code do this simulation step for each cancer type:

pancancer_neoantigen_exp_ccf <- readRDS("~/Immunoediting/data/pancancer_neoantigen_exp_ccf.rds")
pancancer_neoantigen_exp_ccf <- pancancer_neoantigen_exp_ccf %>%
  rename(sample=Tumor_Barcode,chromosome=chr,position=Stop,MT_mean=MT_har_mean) %>% 
  as.data.frame(stringsAsFactors=FALSE)
cancer_type <- names(table(pancancer_neoantigen_exp_ccf$cancer_type))
set.seed(202011104)
for (i in 1:length(cancer_type)){
  cancer <- pancancer_neoantigen_exp_ccf %>% 
    dplyr::filter(cancer_type==.env$cancer_type[i])
  cancer %>% group_by(sample) %>%
    summarise(mutation_counts=n()) -> mutation_counts
  ##sampling mutation counts
  sample_mt <- sample(mutation_counts$mutation_counts,10000,replace = T)
  
  sample <- vector("list",length = 10000)
  names(sample) <- c(1:10000)
  sample <- lapply(sample_mt,
                    function(x){
                      tmp <- data.frame(mutation_id=c(1:x),
                                        IC50=sample(cancer$MT_mean,x),##sample IC50 for mutations
                                        rank=c(x:1))
                      a <- nrow(tmp)
                      tmp <- tmp %>% 
                        mutate(rank=abs((a/2)-rank)+1)
                      return(tmp)
                    })
  
  result <- vector("list",length = 10000)
  names(result) <- c(1:10000)
  cl <- makeCluster(getOption("cl.cores", 12),type="FORK")
  result <- parLapply(cl=cl,sample,
                          function(x){
                            neoantigen_list <- x %>% 
                              filter(IC50<500) %>% 
                              select(mutation_id)
                            neoantigen_list <- neoantigen_list[[1]]
                            if(length(neoantigen_list)==0){
                              return(NA)
                            }else{
                              es <- NeoEnrichment::cales_simulation(x,neoantigen_list)
                              nes <- NeoEnrichment::cal_nes_simulation(es,neoantigen_list,x)
                              return(nes)
                            }
                          })
  stopCluster(cl)
  
  result <- Filter(function(x){length(x)>1},result)
  result_sim <- bind_rows(result)
  saveRDS(result_sim,
          file = paste0("~/Immunoediting/data/simulation/",cancer_type[i],"_simulation.rds"))
}

Then we bind all cancer type simulation data:

This simulation can considered as neutral (or random) nes distribution, so we can compare our TCGA pancancer real nes distribution to this data. If the two distributions are significantly different, then we can conclude that the observed nes in TCGA real data is not random.

From the pancancer overview, It has significant difference between neutral simulation and TCGA real data.

Then we can check how many patients have significant neoantigen negative enrichment signal.

Thus we conclude that the immunoediting signal is undoubtedly exist, however in majority of cancer patient, this signal is undetectable, and only 2.5% (CCF FDR<0.25) of patient show this significant immunoediting signal.

Then we view the cancer type distribution of NES and compare with cancer type simulation:

Neoantigen enrichment score is associated with immune cell infiltration

NES reflects neoantigen-mediated immune selection, thus we want to check relation between the immune cell infiltration level and our NES score.

Firstly, we calculate correlation using spearman method by cancer types

We can see the negative corrlation between nes and most immune score. Plot LF:

We can see the CCF NES is significantly corrlated with LF, while EXP NES is not.

The negative NES value indicates that samples are under immune negative selection, thus we then consider NES as categorical variable and check the difference between positive and negative NES.

In the categorical aspect, negative and positive show significant difference in those immune scores.

Neoantigen enrichment score and immune negative selection

We use a stochastic branching process model (2020 Nature genetics, Eszter Lakatos et.al) to model tumor growth integrating neoantigen-mediated negative selection. The detailed method please see our paper or the origin NG paper

The Julia scripts used in this simulation can be found in code/julia.

The R scripts used to calculate NES of simulation data can be found in code/R/cal_simulate_es.R

We can check the relationship between selection coefficient with NES:

Plot

For each fixed selection coefficient, the normalized neoantigen enrichment score was calculated, and the resulting the resulting neoantigen enrichment score show near linear correlation with s values. This analysis suggests neoantigen enrichment score can be used to infer the immune selection strength in patient.

TCGA Pancancer survial analysis

The enrichment score for CCF distribution reflect the status of immune-elimination step. In cancer patient with strong immune-elimination, good prognosis should be expected. Thus we devided patients into two groups based on NES values, and compare their survial probability.

We can see patients with ESccf values (NES<0; FDR<0.25) show improved prognosis compared to remain patients, while expression not.

The patients with NES<0 and FDR<0.25 is small cohort (209 patients), so we could ask this difference is actually caused by NES(ccf) difference or by cancer type difference? (ie. those 209 patients are mainly several cancertype, and those cancer type has better survial than others). Then we do cox multivariate analysis that take cancer type and immune infiltration into account:

So, when we consider cancer type and immune infiltration, the HR is still more than 1 significantly.

In the Neoantigen ebrichment score is associated with immune cell infiltration part, we show that NES(ccf) is corrlated with some immune scores. So we want to explore whether prognosis is different between immune hot and cold tumors or not.

We foucs on CD8 T cells. Let’s look at the distribution of CD8 T cells between cancer types:

Then we do survial analysis for hot tumor and cold tumor.

This analysis show that both hot and cold cancer has significantly different prognosis, while both of exp shows no difference. This is because the enrichment score for mRNA expression distribution partially reflect the status of immune-escape, thus we explore other escape mechanisms, including:

  • Suppress the transcription/expression of genome alterations encoding high antigenicity (quantified by NES(EXP))
  • Antigen presentation pathway gene mutation
  • Up-regulate the expression of immune suppressive signaling, like PD-L1, CTLA-4

In the Process immune escape sample part, we has already find escape samples. Then we can look at the prognosis of escape samples VS no escape samples:

pancancer_exp <- readRDS("~/Immunoediting/data/pancancer_nes_exp.rds") %>%
  mutate(p_adj=p.adjust(p_value,method = "fdr")) %>%
  mutate(es_type=case_when(
    nes < 0 & p_adj < 0.25 ~ "sig_negative",
    nes < 0 & p_adj >= 0.25 ~ "negative but not sig",
    nes >= 0 ~ "positive"
  ))
pancancer_exp <- left_join(pancancer_exp %>%
                                      mutate(sample=substr(sample,1,12)),
                                    pancancer_survial,by="sample") %>% 
  filter(OS.time<4700)

pancancer_exp <- pancancer_exp %>% filter(!(is.na(apm_mut) 
                                            | is.na(exp_escape) 
                                            | is.na(checkpoint_overexpression)))
pancancer_exp$escape <- mapply(function(x,y,z)
  {ifelse(any(c(x,y,z)=="yes",na.rm = T),"yes","no")},
  pancancer_exp$apm_mut,
  pancancer_exp$checkpoint_overexpression,
  pancancer_exp$exp_escape)

pancancer_exp$cancer <- gsub("TCGA-","",getcancer_type(pancancer_exp$sample))
fit6 <- survfit(Surv(OS.time,OS)~escape,data = pancancer_exp)
p7 <- ggsurvplot(fit6,pval = T,risk.table = T,data = pancancer_exp,
                  tables.theme = theme_cleantable(),  # Clean risk table
                  palette =c("black","red"),
                  size=2,title="Escape of all samples", fun = "pct",
                 legend = c(0.8, 0.9), legend.title = element_blank())
#> Warning: Vectorized input to `element_text()` is not officially supported.
#> Results may be unexpected or may change in future versions of ggplot2.

hot <- pancancer_exp %>% filter(cancer %in% hot_cancer)
fit7 <- survfit(Surv(OS.time,OS)~escape,data = hot)
p8 <- ggsurvplot(fit7,pval = T,risk.table = T,data = hot,
                  tables.theme = theme_cleantable(),  # Clean risk table
                  palette =c("black","red"),
                  size=2,title="Escape of hot cancers", fun = "pct",
                 legend = c(0.8, 0.9), legend.title = element_blank())
#> Warning: Vectorized input to `element_text()` is not officially supported.
#> Results may be unexpected or may change in future versions of ggplot2.

cold <- pancancer_exp %>% filter(cancer %in% cold_cancer)
fit8 <- survfit(Surv(OS.time,OS)~escape,data = cold)
p9 <- ggsurvplot(fit8,pval = T,risk.table = T,data = cold,
                  tables.theme = theme_cleantable(),  # Clean risk table
                  palette =c("black","red"),
                  size=2,title="Escape of cold cancers", fun = "pct",
                 legend = c(0.8, 0.9), legend.title = element_blank())
#> Warning: Vectorized input to `element_text()` is not officially supported.
#> Results may be unexpected or may change in future versions of ggplot2.

p7

We can see, in the hot cancers, no escape samples have a better prognosis while in the cold cancers, there are no difference.